University of Konstanz - Using Visual Analytics to Uncover the Course of a Pandemic

VAST 2010 Challenge

Grand Challenge: Arms Dealing and Pandemic

Authors and Affiliations:

Peter Bak, University of Konstanz, bak@dbvis.inf.uni-konstanz.de [PRIMARY contact]
Christian Rohrdantz, University of Konstanz, christian.rohrdantz@uni-konstanz.de
Curran Kelleher, University of Konstanz, kelleher@dbvis.inf.uni-konstanz.de

Tool(s):

We used many freely available tools for analysis and visualization:

Several small programs were created by our students for task specific data preprocessing and visualization activities:

 

Video:

 

Link: Our submission video

 

ANSWERS:


Introduction
The 2010 VAST Challenge data set has three parts: textual arms dealing intelligence, daily hospitalization records across many countries, and a set of genome sequences for the deadly Drafa Virus. In the current debrief, we aim to synthesize the knowledge we have gleaned from all these data sources into one detailed description of how we understand the situation. For this purpose, we first target each of the data sources separately, then use the collected pieces of information for describing the larger story.

After detailed analysis of all data sources, we conclude that a series of meetings, which took place in Dubai between April 15th and 21st, is at the crux of the pandemic. The participants in these meetings, mostly arms dealers, were infected by one of their accomplices and carried the virus to a number of countries, causing a pandemic outbreak and the death of millions. In the following we will present the collection of facts leading to our description of the events and interpretations. Further, we will define a set of actions that should be taken to answer some of the open questions still remaining. The rest of the document is organized by the three very specific tasks of the Grand Challenge, starting with a brief overview of the general methodology applied for all tasks.

Methodology
All the results presented are the outcome of student projects from a course held at University of Konstanz dedicated to the VAST Challenge. Small student groups were assigned to each mini challenge. Students had to analyze the data provided and answer the tasks of their mini challenge as part of the course requirements. The methodologies applied for all mini challenges used a tight integration of automatic information extraction with interactive visualization. The application of visualizations was twofold: they were used to generate new hypotheses as well as for exploring and confirming the output of automated algorithms. In order to extract the required information from the provided data, it was crucial to introduce visualization at all stages of the analytic process, rather than to design more sophisticated and innovative ways of visualizing the underlying data. In conclusion, the contribution of the current work lies in its methodology of combining automatic algorithms and visualization, rather than in the uniqueness of the techniques.

Linkage between the arms dealing activity and the pandemic outbreak

Analysis
We have identified a series of meetings in Dubai between many illegal arms dealers as a key element in the development of the Drafa Virus pandemic. A Russian arms dealer, Mikhail Dombrovski, had planned to meet Dr. George Ngoki (referred to as “Dr. George”) on April 15th in Dubai. Dr. George is from Nigeria, which was confirmed to be the origin of the virus by our investigation of Mini Challenge 3. We believe the virus was initially transferred from Dr. George to Dombrovski at their meeting on the 15th, then subsequently spread through the arms dealing network active in Dubai at that time. We assume that the people at the meetings returned to their home countries afterwards, and that these infections were the first of the pandemic in these countries.


Figure 1: A representation of planned meetings in Dubai between the 15th and 21st of April, 2009. Rows represent the participants in the meetings (individual name and country of origin). Columns represent days of April in which the meetings took place. Color represents infection status.


Drawing from the remaining collection of meeting plans discovered in the document collection, we attempted to construct the complete pathway of people through which the virus spread. To construct this pathway we used information about planned meetings found in the intelligence documents (we made the simplifying assumption that all meetings took place as planned). Figure 1 expresses this pathway visually. Each circle represents an appearance of a person at a meeting. Red represents infection, green represents non-infection, and orange represents likely infection. We are fairly certain that after the initial transfer from Dr. George to Dombrovski, Dombrovski infected Nicolai and Saleh Ahmed, Ahmed infected “Brother Haik”, and Nicolai infected Igor. All of these people are known to be part of an illegal arms dealing network.
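The reasoning behind such a pathway can be illustrated with a minimal sketch, which is not the tool used for Figure 1. It assumes that every planned meeting took place, that any meeting with an infected attendee infects all other attendees, and that a person becomes infectious the day after exposure. Only the meeting on the 15th and the infection of Nicolai on the 19th are dated in our analysis; all other days, the pairing of attendees per meeting, and "Participant X" are hypothetical placeholders.

```python
from collections import namedtuple

# A planned meeting: day of April and the people expected to attend.
Meeting = namedtuple("Meeting", ["day", "participants"])


def propagate_infection(meetings, seed):
    """Return {person: day of first exposure}.

    Assumptions (illustrative only): all meetings happen as planned, one
    infected attendee infects everyone else present, and a newly exposed
    person can infect others only from the following day onward.
    """
    infected_on = {seed: 0}  # day 0 = infected before the first meeting
    for day in sorted({m.day for m in meetings}):
        # People who were already infected before this day started.
        carriers = {p for p, d in infected_on.items() if d < day}
        for m in meetings:
            if m.day == day and carriers & set(m.participants):
                for person in m.participants:
                    infected_on.setdefault(person, day)
    return infected_on


# Hypothetical schedule; only days 15 and 19 are taken from the text.
meetings = [
    Meeting(15, ("Dr. George", "Dombrovski")),
    Meeting(17, ("Dombrovski", "Saleh Ahmed")),
    Meeting(18, ("Nicolai", "Participant X")),  # before Nicolai is infected
    Meeting(19, ("Dombrovski", "Nicolai")),
    Meeting(20, ("Saleh Ahmed", "Brother Haik")),
    Meeting(21, ("Nicolai", "Igor")),
]

infected_on = propagate_infection(meetings, seed="Dr. George")
for person, day in sorted(infected_on.items(), key=lambda item: item[1]):
    print(f"{person}: exposed on day {day}")
```

Under these assumptions, a participant who met Nicolai only before the 19th stays uninfected, which is exactly the gap discussed for the participants from Kenya and Lebanon.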

We hypothesize that the participants from Lebanon and Kenya were also infected at some point in Dubai, because those countries experienced the viral pandemic. These people may have had undocumented meetings with infected individuals in Dubai. However, it is also possible that the virus was spread to these countries through other paths. We do not have enough information to make a conclusion with certainty.


Figure 2: A visualization of a phylogenetic tree computed from the viral sequence data. Distance from the center corresponds roughly to edit distance of the sequence from the original strain, Nigeria B. Color represents severity of the symptoms caused by the strain.


In addition to the story in Dubai, phylogenetic analysis of the viral sequence data reveals that Nicolai was a carrier of a strain which evolved to become very deadly. Figure 2 visualizes the results of the phylogenetic analysis (the tree structure), along with the severity of each strain (color). The strain carried by Nicolai appears in the figure as strain number 583. This strain itself is quite severe, and its offspring strains (nodes branching from node 583 away from the center of the figure) are very severe as well. This supports our hypothesis that Nicolai was the one who infected others with the strain which evolved into the most deadly variants of the virus, spreading through arms dealers in Dubai and eventually causing the pandemic.
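The radial layout of Figure 2 relies on the edit distance between each sequence and the original strain. The following sketch shows one common way to compute such a distance (Levenshtein distance) and to rank strains by it. It does not reproduce how the phylogenetic tree itself was built, and the sequences and strain names below are toy placeholders, not challenge data.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distances_from_reference(reference, strains):
    """Map each strain name to its edit distance from the reference."""
    return {name: edit_distance(reference, seq) for name, seq in strains.items()}


# Toy sequences (hypothetical), standing in for the real genome data.
reference = "ACGTACGT"
strains = {
    "toy_strain_1": "ACGTACGA",  # one substitution
    "toy_strain_2": "ACGAACGA",  # two substitutions
    "toy_strain_3": "CGTACGT",   # one deletion
}

distances = distances_from_reference(reference, strains)
for name, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(name, dist)
```

Strains with a small distance would sit close to the center of a layout like Figure 2, and strains with a larger distance further out.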


Figure 3: Temporal patterns of hospitalization and death among countries. Time is represented on the x-axes, and the y-axes show the normalized number of patients in each country. Noise was removed by only considering symptoms caused by the virus (symptoms which in general match the characteristic pandemic temporal pattern). Color differentiates between hospitalizations and deaths.


The hospital records were used to generate Figure 3, which illustrates the temporal characteristics of the pandemic across countries. Each plot represents a country. The plots are ordered left to right and top to bottom by date of peak hospitalization. We used the peaks as indicators of temporal sequence, as onset data was very noisy. According to this metric, the pandemic occurred first in Nairobi, Kenya. This agrees with our aforementioned hypothesis that Dr. George, from Nairobi, was a key link in spreading the disease to other countries through the Dubai meetings, as outlined in Figure 1.
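Ordering countries by their hospitalization peak can be sketched as follows. This is an illustration under assumptions: the normalization shown (dividing by each country's maximum) is one plausible choice and not necessarily the one used for Figure 3, and the country labels and daily counts are invented toy values.

```python
def normalize(counts):
    """Scale a daily count series so that its maximum becomes 1.0."""
    peak = max(counts)
    return [c / peak for c in counts] if peak else list(counts)


def peak_day(counts):
    """Index of the first day on which the series reaches its maximum."""
    return counts.index(max(counts))


def order_by_peak(series_by_country):
    """Country names sorted by the day of peak hospitalization."""
    return sorted(series_by_country, key=lambda c: peak_day(series_by_country[c]))


# Toy daily hospitalization counts (hypothetical values).
hospitalizations = {
    "Country A": [1, 5, 9, 4, 2],
    "Country B": [0, 1, 3, 8, 5],
    "Country C": [2, 7, 3, 1, 0],
}

peak_order = order_by_peak(hospitalizations)
normalized = {c: normalize(s) for c, s in hospitalizations.items()}
print(peak_order)
```

Using the peak rather than the first rise makes the ordering less sensitive to the noisy onset of each curve, at the cost of ignoring differences in how quickly each outbreak grew.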

Based on our knowledge of the arms dealing meetings in Dubai, it is not clear how the virus reached Kenya and Lebanon, the first two countries to experience the pandemic outbreak. The people identified as arms dealers from these two countries first met with Nicolai in Dubai before the 19th of April, when Nicolai was likely infected by Dombrovski. Therefore, we hypothesize that there were additional, undocumented meetings in Dubai between infected members of the arms dealing network and those from Kenya and Lebanon, but for this we have no clear evidence. We are, however, fairly confident that members of the arms dealing network from Yemen, Saudi Arabia and Iran were infected by Nicolai and Saleh Ahmed in Dubai, then carried the disease back to their countries of origin.




Figure 4: Geographic maps which show the countries involved in arms dealing and the pandemic. In the map on the left, color represents an interestingness metric for each country, computed from country name frequency weighted by time (more recent occurrences have higher weight) across all intelligence documents. In the map on the right, countries affected by the pandemic are colored by their mortality rate.

We mapped our previous results from both the document collection and the hospitalization records to a geographic map for further analysis. These results are shown in Figure 4. The first interesting finding was that Thailand and Turkey are included in the data on the pandemic outbreak, but do not show its characteristic mortality rate. We can also observe in Figure 3 that hospitalizations and deaths in Thailand and Turkey do not exhibit the characteristic pandemic curve over time. From this observation, we conclude that the virus never reached these countries. This conclusion agrees with our proposed path of viral spread during the Dubai meetings: though arms dealers from Thailand and Turkey were indeed present in some of the meetings (Hakan, Celik and Boonmee), they were never exposed to an infected individual according to the facts we have assembled and visualized in Figure 1.
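A time-weighted frequency metric of the kind shown in the left map of Figure 4 can be sketched as follows. The exact weighting used for the figure is not specified here, so the exponential decay with a half-life, the parameter name half_life_days, and the sample documents are all assumptions made for illustration.

```python
from datetime import date


def interestingness(documents, countries, reference_date, half_life_days=30.0):
    """Sum country-name mentions over all documents, weighting each
    mention by the age of its document.

    Assumption: a mention loses half its weight every half_life_days,
    so more recent occurrences count more.
    """
    scores = {country: 0.0 for country in countries}
    for doc_date, text in documents:
        age_days = (reference_date - doc_date).days
        weight = 0.5 ** (age_days / half_life_days)
        lowered = text.lower()
        for country in countries:
            scores[country] += weight * lowered.count(country.lower())
    return scores


# Toy documents (hypothetical dates and text, not challenge data).
documents = [
    (date(2009, 4, 1), "Shipment discussed in Yemen. Yemen contact confirmed."),
    (date(2009, 3, 2), "Meeting arranged in Kenya."),
]

scores = interestingness(
    documents,
    countries=["Yemen", "Kenya", "Iran"],
    reference_date=date(2009, 4, 1),
)
for country, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(country, round(score, 3))
```

A plain substring count is used for brevity; a real pipeline would need to handle name variants and avoid matching country names inside longer words.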

Lessons Learned
Our visual analytics based approach was validated by the quality of our results. Nevertheless, its advantages and drawbacks deserve discussion. As a guiding principle, we always aimed to involve the human as early as possible in the analytic process. This involvement was mostly supported by standard visualization techniques, complemented by automatic analysis techniques. Their combination is, in our opinion, the most successful approach. However, the data, task descriptions and background information provided by the challenge committee were very diverse, complex and multimodal, making the analytic process a true challenge. We experienced great difficulties in fitting the extracted pieces of the puzzle together. We therefore often made simplifying assumptions for the sake of convenience, some of which would likely be unacceptable in real-world scenarios. Our theories, though partially supported by evidence and always logical, are not fully consistent. For example, we could not explain how the arms dealers from Kenya (Owiti and Otieno) got infected and carried the virus to Kenya. There is also no clear evidence of what the participants of the meetings in Dubai did afterwards; we assume that they went back to their home countries, but there is no direct evidence for this. It also remains an open question who actually participated in the meetings; we only know that they were planned. Further intelligence should be gathered about the meetings themselves. The intelligence also does not reveal how the virus is transmitted; the medical reports should have provided this piece of information. The most problematic part of our assumptions is that we are unclear about the circumstances of Dr. Ngoki (aka Dr. George): what kind of “medical project” he was working on, and how he got infected in the first place.